9.1 ISC-CI Model#

The ISC-CI model introduces a mechanism for context inference, based on the key assumption that temporal co-occurrence provides a useful basis for inferring shared context. Specifically, it assumes that (1) objects occurring together in a given context tend to share the properties elicited by that context; (2) these co-occurrence statistics are learned over the course of development; and (3) this implicit knowledge provides a basis for inferring, from a few examples of objects encountered in a new context, both which features are relevant in that context and what other objects are likely to occur in that context.

To make these ideas clear, consider the contexts in which you might encounter different kinds of birds: a bird-watching field trip in science class, a visit to the bird section of the zoo, and a picture book about birds. Each situation involves multiple types of birds (e.g., robins, crows, and ravens) and exposure to multiple bird-related properties (e.g., can-fly, eats-worms, is-bird) in various combinations. After these experiences, encountering a new context in which birds are relevant (e.g., learning that crows and ravens have hollow bones in the bird section of the Natural History museum) is likely to be interpreted as relating specifically to birds and their properties, implying that other birds like robins may also occur in this new context, and that they will share similar properties (e.g., robins also have hollow bones).

Conversely, contexts such as a science lesson on aerodynamics, a visit to a flight exhibit at a science museum, and a film on the history of flight are likely to involve multiple types of flying things (e.g., crows, airplanes, and butterflies) and flight-related properties (e.g., can-fly, has-wings, seen-in-the-sky). This suggests that a new context involving flying objects such as crows and airplanes (e.g., learning that crows and airplanes are associated with Bernoulli’s principle) likely relates to all things that can fly, implying that other flying things like butterflies may also occur in this new context and, again, share similar properties (e.g., butterflies are also associated with Bernoulli’s principle).

Thus, the properties shared by items encountered in a situation can provide a clue about what the current context is, what properties are currently important, and what other items are likely or unlikely also to be observed. The central hypothesis embodied by the ISC-CI model is that learning such environmental structure can support future inferences about which features might be relevant in novel contexts, based on the distribution of items that co-occur in those contexts. That is, observing that a new context involves a certain set of objects (e.g., both robins and airplanes) provides evidence that certain features will be context-relevant (e.g., can-fly and has-wings), but not others (e.g., lays-eggs), based on past experience.

Importantly, this process is graded and probabilistic rather than absolute, as any given set of objects can co-occur in different contexts at different frequencies. In particular, features that are broadly true of many objects are less likely to be relevant in a new context than features that are true of the more limited set of objects seen in that context (Griffiths et al., 2010; Xu & Tenenbaum, 2007). This is because any particular set of objects is unlikely to be observed in a broad context: there are many animals but few Corvidae, so the probability of observing both crows and ravens in the context of animals is lower than the probability of observing both crows and ravens in the context of Corvidae. A context involving both crows and ravens is therefore more likely to relate to Corvidae specifically than to animals in general.
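To make this concrete, here is a minimal numeric sketch of this “size principle”, using the same simplifying assumptions adopted later in this notebook (objects sampled uniformly and independently) and hypothetical set sizes:

# hypothetical counts, for illustration only
n_animals = 100
n_corvidae = 4

# probability of independently drawing both members of a specific pair
# (with replacement) from each candidate context
p_pair_given_animals = (1 / n_animals) ** 2    # 0.0001
p_pair_given_corvidae = (1 / n_corvidae) ** 2  # 0.0625

# observing the pair favors the narrower context by a large factor
print(p_pair_given_corvidae / p_pair_given_animals)  # 625.0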

Setup and Installation#

%%capture
%pip install psyneulink

import psyneulink as pnl
import pandas as pd
import random

Generating the Training Data#

Feature Co-occurrences#

We design a training environment that simulates experiencing object co-occurrences throughout learning, under the key assumption noted just above. The environment consists of a series of episodes corresponding to different contexts. Each context involves a set of objects that share a common semantic feature (e.g., things that are birds, things that can fly, things that are found in the zoo), with each feature represented by a single output unit (the feature labels in the ISC-CI model).

We generate the episodes using the objects and features in the Leuven Concepts Database (De Deyne & Storms, 2008; Storms, 2001; Ruts et al., 2004). That database contains a matrix of binary judgments provided by human raters indicating, for each object-feature pairing, whether the object possesses the feature (e.g., does a bear weigh more than 100 lbs? Are kangaroos found in zoos?).

Let’s explore the dataset:

FEATURE_PATH = 'https://raw.githubusercontent.com/PrincetonUniversity/NEU-PSY-502/refs/heads/main/data/isc_ci/features.csv'

feature_df = pd.read_csv(FEATURE_PATH, index_col=0)

feature_df.head()
is small is a bird is an animal is big can fly is an insect mammal is a fish lays eggs is brown ... is for all ages you can play different notes whit it can be used to put something in can be bought in sports store costs a lot of money drives above the ground driven by 1 person used in water used in the house worn often
monkey 0 0 1 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
beaver 0 0 1 0 0 0 1 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
bison 0 0 1 1 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
dromedary 0 0 1 1 0 0 1 0 0 1 ... 0 0 0 0 0 0 0 0 0 0
squirrel 1 0 1 0 0 0 1 0 0 1 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 385 columns
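As a quick check, we can look up a single object-feature judgment, using names visible in the preview above:

# 1 means raters judged squirrels to be small, 0 means they did not
feature_df.loc['squirrel', 'is small']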

🎯 Exercise 1a

As stated above, a context is defined by the common feature shared by the objects in that context. For this dataset, which objects might occur in the “eats mice” context?

✅ Solution

We can find the objects that might occur in the “eats mice” context by filtering the dataset for the feature “eats mice” and extracting the corresponding object names:

eats_mice = feature_df[feature_df['eats mice'] == 1].index.tolist()
print(eats_mice)

🎯 Exercise 1b

As stated above, encountering different objects in the same episode provides evidence to the agent that they are in a specific context. For this dataset, which context might be defined by the object “owl” (and which by “falcon”)?

✅ Solution

The contexts are defined by the features of the objects:

contexts_owl = feature_df.columns[feature_df.loc['owl'] == 1]
contexts_falcon = feature_df.columns[feature_df.loc['falcon'] == 1]

print('Contexts defined by "owl":', contexts_owl)
print('Contexts defined by "falcon":', contexts_falcon)

🎯 Exercise 1c

Objects can appear in various contexts, so any two given objects might jointly define multiple possible contexts. What possible contexts are defined by “owl” and “falcon” together?

✅ Solution

The possible contexts are defined by the features that are shared by “owl” and “falcon”:

contexts_both = feature_df.columns[(feature_df.loc['owl'] == 1) & (feature_df.loc['falcon'] == 1)]
print(contexts_both)

🎯 Exercise 1d

Now, using the theory from above, try to answer the following question: given that an agent experiences an episode in which “owl” and “falcon” define the context, would the agent be more surprised by encountering a “cat” or a “penguin” in this context?

(This is not a straightforward question, and you are not expected to give a definitive answer. Think about the different probabilistic/statistical features of the world that influence the agent’s prediction.)

Tip: Here we make the (unrealistic) simplifying assumption that contexts are uniformly distributed. In other words, any kind of episode (a bird-watching tour, a science lesson, etc.) has occurred equally often in the past.

💡 Hint a

In the exercise above, we’ve seen that “owl” and “falcon” can elicit different contexts. But is there a way to “quantify” which of these contexts is more likely to be elicited?

Tip: This depends on how likely it is to encounter both an “owl” and a “falcon” in any of the given contexts.

💡 Hint b

We calculate the probabilities from above:

contexts_both = feature_df.columns[(feature_df.loc['owl'] == 1) & (feature_df.loc['falcon'] == 1)]

context_probabilities = {}

for c in contexts_both:
    objects = feature_df[feature_df[c] == 1].index.tolist()
    nr_objects = len(objects)
    # Simplification (objects within a context are equally likely):
    # the probability of encountering any one object is 1 / nr_objects,
    # so the probability of 'drawing' two specific objects is 1 / nr_objects**2
    probability = 1 / nr_objects**2
    context_probabilities[c] = probability

print(context_probabilities)
print()
most_probable_context = max(context_probabilities, key=context_probabilities.get)
print(most_probable_context)

💡 Hint c

The code above shows that the most probable context is “eats mice”. But penguins don’t eat mice, while cats do; you can convince yourself by querying the database, as in the sketch below. So if the agent encounters a “cat”, it should be less surprised than if it encounters a “penguin”.
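A minimal query to check this directly in the dataset:

# 1 = the object has the feature 'eats mice', 0 = it does not
print(feature_df.loc[['cat', 'penguin'], 'eats mice'])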

However, this is just the most probable context. Although “penguin” doesn’t appear in the most probable context of “owl” and “falcon”, this might be offset by “penguin” appearing in other relatively probable contexts more often than “cat” does.

✅ Solution

First, we calculate the probabilities of encountering “owl” and “falcon” in each candidate context. We normalize these probabilities and use them as weights: we sum the probabilities of the contexts in which “penguin” also appears, and compare against those for “cat”:

contexts_both = feature_df.columns[(feature_df.loc['owl'] == 1) & (feature_df.loc['falcon'] == 1)]

context_probabilities = {}

for c in contexts_both:
    objects = feature_df[feature_df[c] == 1].index.tolist()
    nr_objects = len(objects)
    probability = 1 / nr_objects**2
    context_probabilities[c] = probability

# normalize to get the probability of a certain context being evoked
sum_probs = sum(context_probabilities.values())
normalized = {c: p / sum_probs for c, p in context_probabilities.items()}

penguin_weight = 0
penguin_nr = 0
cat_weight = 0
cat_nr = 0


for c, v in normalized.items():
    objects = feature_df[feature_df[c] == 1].index.tolist()
    # the context evokes penguin
    if 'penguin' in objects:
        penguin_weight += v
        penguin_nr += 1
    # the context evokes cat
    if 'cat' in objects:
        cat_weight += v
        cat_nr += 1

print(f'penguins({penguin_nr}): {penguin_weight}')
print(f'cats({cat_nr}): {cat_weight}')

Embedding#

The ISC-CI model uses a word embedding (not a one-hot encoding) as input. This embedding can be interpreted as a “context-independent” representation. Here, we load the embeddings:

EMBEDDING_PATH = 'https://raw.githubusercontent.com/PrincetonUniversity/NEU-PSY-502/refs/heads/main/data/isc_ci/embeddings.csv'

embeddings_df = pd.read_csv(EMBEDDING_PATH, index_col=0)

chicken_embedding = embeddings_df.loc['chicken']
chicken_embedding
0     0.119731
1     0.987248
2     0.982023
3     0.410961
4     0.451753
        ...   
59    0.022675
60    0.326714
61    0.014851
62    0.035483
63    0.982558
Name: chicken, Length: 64, dtype: float64

🎯 Exercise 2

We can ask the same question as above: can you think of a way to use the embeddings of “owl”, “falcon”, “cat”, and “penguin” to quantify whether the agent would be more surprised by encountering a “cat” or a “penguin” in the context of “owl” and “falcon”?

💡 Hint

With word embeddings, we can quantify the similarity between words by computing the distance between their vectors.

✅ Solution

We calculate the distance between the average of the “owl” and “falcon” embeddings and the embeddings of “cat” and “penguin”:

owl_emb = embeddings_df.loc['owl']
falcon_emb = embeddings_df.loc['falcon']
cat_emb = embeddings_df.loc['cat']
penguin_emb = embeddings_df.loc['penguin']

# average the two support embeddings to represent the evoked context
owl_falcon_emb = (owl_emb + falcon_emb) / 2

# squared Euclidean distance from this average to each query embedding
cat_distance = ((owl_falcon_emb - cat_emb)**2).sum()
penguin_distance = ((owl_falcon_emb - penguin_emb)**2).sum()

print('cat distance ', cat_distance)
print('penguin distance ', penguin_distance)

Note that the “context”-based and “embedding”-based calculations lead to different predictions:

  • context -> “cat” is less surprising in “owl”, “falcon” context

  • embedding -> “penguin” is less surprising in “owl”, “falcon” context

Training Data#

Now, we construct a set of episodes by uniformly sampling from the set of features with replacement, so that each episode involves one shared semantic feature that defines the associated context. Given this feature, we generate a support set by uniformly sampling two items sharing the feature, and a query set by uniformly sampling one additional item sharing the feature (positive) and one that does not share the feature (negative). Note that, as mentioned above, any given set of objects can co-occur in multiple contexts (in other words, can share multiple features); however, for features that are broadly true of many objects, any specific pair of objects is less likely to be sampled. For example, if the feature is “eats mice”, the likelihood of sampling exactly {“owl”, “falcon”} is higher than if the feature is “is a bird” (since there are many more birds than mouse eaters).

def get_random_episode():
    # randomly pick a context (feature)
    feature = random.choice(feature_df.columns)

    # one hot encoded feature vector is our context:
    context_label = [0] * len(feature_df.columns)
    context_label[feature_df.columns.get_loc(feature)] = 1

    # get objects that have this feature by name
    objects_included = feature_df[feature_df[feature] == 1].index.tolist()

    # get objects that don't have this feature
    objects_excluded = feature_df[feature_df[feature] == 0].index.tolist()

    # randomly pick two supports (can be the same twice)
    support = random.choices(objects_included, k=2)

    # the support vector is the embedding of the two supports
    support_vector_1 = embeddings_df.loc[support[0]]
    support_vector_2 = embeddings_df.loc[support[1]]

    # decide whether to pick a query object that is in the context or not
    choice = random.choice([0, 1])

    if choice == 0:
        # pick an object that is not in the context (negative query)
        query = random.choice(objects_excluded)
    else:
        # pick an object that is in the context (positive query)
        query = random.choice(objects_included)

    # the query vector is the embedding of the query
    query_vector = embeddings_df.loc[query]

    return {
        'context_label': context_label,
        'support_1': support_vector_1,
        'support_2': support_vector_2,
        'query': query_vector,
        'response': [1, 0] if choice == 1 else [0, 1]  # [yes, no]: was the query in the context?
    }
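We can sample one episode to inspect what the function returns (a sketch; output varies from run to run):

example = get_random_episode()
# recover the name of the sampled context feature from the one-hot label
print('context feature:', feature_df.columns[example['context_label'].index(1)])
print('response target ([yes, no]):', example['response'])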

Training Loop#

The model is now trained on 20,000 episodes.

Note that this is not enough for the model to converge; it simply illustrates what a training loop looks like. We don’t expect reasonable predictions after this amount of training.

tot_reps = 1
examples_per_rep = 20_000
n_epochs = 1

for reps in range(tot_reps):

    # keys are the model's input mechanisms (defined when the model was constructed)
    inputs_dict = {
        query: [],
        support_1: [],
        support_2: [],
    }

    # keys are the model's target mechanisms
    targets_dict = {
        context_label: [],
        response: [],
    }

    # Generate training examples
    i = 0
    while i < examples_per_rep:
        example = get_random_episode()
        # Append the input to the dictionary
        inputs_dict[query].append(example['query'])
        inputs_dict[support_1].append(example['support_1'])
        inputs_dict[support_2].append(example['support_2'])

        # Append the targets to the dictionary
        targets_dict[context_label].append(example['context_label'])
        targets_dict[response].append(example['response'])
        i += 1

    # Train the network for `n_epochs`
    result = isc_ci_model.learn(
        inputs={
            'inputs': inputs_dict,
            'targets': targets_dict,
            'epochs': n_epochs
        },
        execution_mode=pnl.ExecutionMode.PyTorch
    )
    # Print a dot for each repetition to track progress
    print('.', end='')
    # Print a new line every 10 repetitions
    if (reps + 1) % 10 == 0:
        print()
(Output truncated: this training run was interrupted with a KeyboardInterrupt before completing.)

Testing#

embedding_falcon = embeddings_df.loc['falcon']
embedding_owl = embeddings_df.loc['owl']

embedding_cat = embeddings_df.loc['cat']
embedding_penguin = embeddings_df.loc['penguin']

res_cat = isc_ci_model.run(
    {
        query: embedding_cat,
        support_1: embedding_falcon,
        support_2: embedding_owl,
    })
response.value
array([[0.4971299 , 0.50272648]])
res_penguin = isc_ci_model.run(
    {
        query: embedding_penguin,
        support_1: embedding_falcon,
        support_2: embedding_owl,
    })
response.value
array([[0.47386775, 0.52613219]])
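Reading these outputs: the first unit corresponds to the “yes, in context” response (the [1, 0] target used during training). A quick comparison using the values printed above:

# values copied from the two response arrays above (first unit = "yes")
yes_cat, yes_penguin = 0.4971299, 0.47386775
print('cat "yes" > penguin "yes":', yes_cat > yes_penguin)  # True, though barely (the model is undertrained)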

🎯 Exercise 5

Since the training was not sufficient, the model will not be able to predict the correct response. However, can you explain what the expected outcome for the above support and query items would be?

✅ Solution

With sufficient training, the expected outcome for the above support and query items would be:

A response favoring “yes” (in context) more strongly for the query “cat” than for the query “penguin”, since, as shown in the context-based analysis above, “cat” is less surprising given the supports “owl” and “falcon”.

🎯 Exercise 6

Even with a larger training set, the model is unlikely to “see” every combination of support and query items for every possible context. How can it still infer the context for unseen combinations?

💡 Hint

Why are we using context-independent embedding layers in the first place? What would happen if we just used one-hot encodings and skipped these layers?

✅ Solution

The embeddings allow the model to generalize to unseen combinations, since these may be “similar” to seen combinations. For example, birds share features in embedding space, so even if the model hasn’t seen a specific combination of birds, it can still learn the context elicited by such combinations.

If we just used one-hot encodings and skipped the context-independent layer, the model could not generalize to unseen combinations, since one-hot vectors share no activations (they are all mutually orthogonal). With enough training, the model might still learn to make correct predictions on seen combinations, but never on unseen ones.
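As a sketch of this point, we can compare distances in embedding space with distances between one-hot encodings, using the same squared-distance measure as above (the one-hot codes below are constructed purely for illustration):

import numpy as np

def sq_dist(a, b):
    return ((a - b) ** 2).sum()

# embeddings: related objects (e.g., two birds) can lie close together
print('owl vs falcon:', sq_dist(embeddings_df.loc['owl'], embeddings_df.loc['falcon']))
print('owl vs cat   :', sq_dist(embeddings_df.loc['owl'], embeddings_df.loc['cat']))

# one-hot encodings: every pair of distinct objects is equally far apart
n = len(feature_df.index)
owl_onehot, cat_onehot = np.eye(n)[0], np.eye(n)[1]  # hypothetical codes
print('any distinct one-hot pair:', sq_dist(owl_onehot, cat_onehot))  # always 2.0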